PTM Stoichiometry #797
base: master
// get the localized modifications from the peptide full sequence and add any amino acid/modification combination not
// seen yet to the occupancy dictionary
foreach (KeyValuePair<int, List<string>> aaWithModList in peptideMods)
In situations like this, you can write "var aaWithModList" instead of spelling out the full type.
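A minimal sketch of the suggestion follows. The sample dictionary is hypothetical, with its int key assumed to be a residue position. With var, the compiler infers KeyValuePair<int, List<string>> for each element. On .NET Core 2.0 or later, the pair can also be deconstructed into named parts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample: residue position -> modification names at that position.
var peptideMods = new Dictionary<int, List<string>>
{
    [1] = new List<string> { "hydroxylation" },
    [4] = new List<string> { "phospho" },
};

var seen = new List<string>();

// `var` lets the compiler infer KeyValuePair<int, List<string>> for each element.
foreach (var aaWithModList in peptideMods)
{
    foreach (var modName in aaWithModList.Value)
        seen.Add($"{aaWithModList.Key}:{modName}");
}

// On .NET Core 2.0+, KeyValuePair<TKey, TValue> can also be deconstructed,
// which names the key and value directly instead of using .Key/.Value.
foreach (var (position, modNames) in peptideMods)
    Console.WriteLine($"{position}: {string.Join(", ", modNames)}");
```

Deconstruction is not available on .NET Framework, so the plain var form is the portable choice there.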
I think readers/Quant... is the best place for it. That way it can be used to find the occupancy of results from other software, should that be desired.
To optimize the inputs and outputs of the function, you should break your test method into two: one test method that reads in all the data you need, and another method (not a test method) that gets called to calculate the occupancy. This will help you better understand what the method needs, and help us make recommendations.
Requesting a second round of reviews! The second-to-last commit describes most of the changes in a little more detail. Currently pending work is to create a small enough subset of the raw data to create a test similar to the … I'd be happy to hear about 1) code optimization, 2) the currently written tests, and 3) clarifications on code commenting. In a conversation, Nic suggested using objects for my main PTM calculation code rather than the 5-level-deep dictionary, so thoughts on that would be useful as well. Of course, anything else is useful too. TIA!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #797      +/-   ##
==========================================
+ Coverage   77.78%   77.85%   +0.06%
==========================================
  Files         230      230
  Lines       34152    34307     +155
  Branches     3538     3564      +26
==========================================
+ Hits        26566    26710     +144
- Misses       6983     6992       +9
- Partials      603      605       +2
{
    // use a regex to get all modifications
    string pattern = @"\[(.+?)\]";
    Regex regex = new(pattern);
We need to make sure that this method never accidentally identifies [hydroxylation]EPT[phospho] as a mod in P[hydroxylation]EPT[phospho]IDE.
I'm not sure that ]EPT[ won't be ignored by your regex.
After finding an opening bracket, the regex always stops at the next closing bracket (the +? quantifier is lazy), except (updated now) when the closing bracket belongs to an ion charge state.
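This small sketch addresses the reviewer's concern directly. It runs the pattern from the snippet above, \[(.+?)\], against P[hydroxylation]EPT[phospho]IDE, alongside a greedy variant, \[(.+)\], for comparison. The greedy version is only there to show what the feared failure would look like. The charge-state exception mentioned in the reply is not reproduced here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string fullSequence = "P[hydroxylation]EPT[phospho]IDE";

// Pattern from the PR: the lazy +? stops each capture at the first ']' after its '['.
string[] lazy = Regex.Matches(fullSequence, @"\[(.+?)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

// Greedy + for comparison: a single capture runs to the last ']' in the string.
string[] greedy = Regex.Matches(fullSequence, @"\[(.+)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(" | ", lazy));   // hydroxylation | phospho
Console.WriteLine(string.Join(" | ", greedy)); // hydroxylation]EPT[phospho
```

With the lazy pattern, ]EPT[ falls between two matches and is never captured. Only the greedy form produces the single combined "mod" the review warns about.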
namespace MzLibUtil
{
    // Should this have all of the parent data (i.e. protein group, protein, peptide, peptide position)? Unnecessary for now, but probably useful later.
    public class UtilModification
UtilModification => LocalizedModificationFromTsv
modName => IdWithMotif
position => PeptidePositionZeroIsNterminus
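Here is a minimal sketch of the suggested PeptidePositionZeroIsNterminus convention. It swaps the PR's regex for a plain character scan. Only residues outside brackets advance the position, so 0 is the N-terminus and k is the k-th residue. That way, positions never depend on the length of earlier modification strings.

ParseFullSequence is a hypothetical helper, not part of mzLib. It assumes each [name] directly follows the residue it modifies. C-terminal notation and the charge-state case from the regex discussion are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var parsed = ParseFullSequence("P[hydroxylation]EPT[phospho]IDE");
Console.WriteLine(parsed.BaseSequence); // PEPTIDE
foreach (var (pos, modNames) in parsed.Mods)
    Console.WriteLine($"{pos}: {string.Join(", ", modNames)}");

// Hypothetical helper: returns the base sequence plus a map of
// PeptidePositionZeroIsNterminus -> modification names. Assumes each "[name]" directly
// follows the residue it modifies (or opens the string for an N-terminal mod).
(string BaseSequence, Dictionary<int, List<string>> Mods) ParseFullSequence(string fullSequence)
{
    var baseSequence = new StringBuilder();
    var mods = new Dictionary<int, List<string>>();
    int depth = 0;     // bracket nesting depth, so a name like "Fe[III]" stays in one mod
    int modStart = -1; // index of the first character inside the outermost '['

    for (int i = 0; i < fullSequence.Length; i++)
    {
        char c = fullSequence[i];
        if (c == '[')
        {
            if (depth == 0) modStart = i + 1;
            depth++;
        }
        else if (c == ']')
        {
            if (--depth < 0) throw new FormatException($"Unbalanced ']' at index {i}");
            if (depth == 0)
            {
                // Only residues count toward the position, never characters of mod names.
                int position = baseSequence.Length; // 0 = N-terminus, k = k-th residue
                if (!mods.TryGetValue(position, out var names))
                    mods[position] = names = new List<string>();
                names.Add(fullSequence.Substring(modStart, i - modStart));
            }
        }
        else if (depth == 0)
        {
            baseSequence.Append(c);
        }
    }
    if (depth != 0) throw new FormatException("Unclosed '['");
    return (baseSequence.ToString(), mods);
}
```

Keying the result by Dictionary<int, List<string>> mirrors the peptideMods type in the loop above, so more than one mod can sit on the same residue.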
{
    public string FullSequence { get; set; }
    public string BaseSequence { get; set; }
    public UtilProtein ParentProtein { get; set; }
Maybe this should be ProteinGroup?
    }
}
public class UtilProtein
FlashLFQ ProteinGroup
…ve amino acid positions depending on the length of the modification string and its index. The current approach fixes that.
…sitionFrequencyAnalysis UtilProtein class (now converts peptide mod positions to protein positions) and PFA argument (list of named tuples for clarity)
…ing to master and matching content
Some pending changes: Side Note:
542f959 to 0184816
…d of FlashLFQ to output occupancy. Updated UtilClasses for correct UtilProtein.ModifiedAminoAcidPositionsInProtein positions.
Creating an mzLib method to calculate the stoichiometry (or site occupancy) of PTMs using the intensity of each quantified peak. The current inputs are the protein database file (.xml) path(s) and the AllQuantifiedPeaks.tsv file path. The output, occupancyDict, is currently a dictionary of nested dictionaries keyed by PROTEINX, MAAX, and MODNAME1. PROTEINX is the protein accession, MAAX is the modified amino acid at protein position X, and MODNAME1 is the full label of the modification. For each MAAX, there is also a "Total" key (instead of a modification name). It holds the total intensity of that amino acid measured in the quantified peaks file, including both modified and unmodified peptides that contain that specific residue.

The general approach is to first get all of the modification intensities and record them in occupancyDict. At the same time, I fill proteinSeqRangesSeen, a dictionary keyed by protein accession whose values are lists of (STARTINDEX, ENDINDEX, INTENSITY) tuples. This keeps track of the index ranges seen for each protein. Once all of the mods have been parsed, every amino acid falling into any of those ranges has its "Total" intensity increased by that range's intensity.

From our discussion, I've added below some of the items I'd like to get opinions on. I made them a task list, primarily to help me keep track of what I've figured out.
FlashLFQResults and Readers/QuantificationResults.
Thanks in advance!
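The two-pass approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation:

The input rows are hypothetical. Each row holds an accession, 1-based inclusive Start/End protein positions, an intensity, and mods keyed by PeptidePositionZeroIsNterminus.
MAAX is simplified to an integer protein position.
An N-terminal mod is attributed to the peptide's first residue.
Occupancy is taken to be a site's modification intensity divided by its "Total".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical quantified peptides: (accession, 1-based inclusive Start/End in the protein,
// peak intensity, mods keyed by PeptidePositionZeroIsNterminus).
var peaks = new List<(string Accession, int Start, int End, double Intensity, Dictionary<int, List<string>> Mods)>
{
    ("PROTEIN1", 10, 16, 300.0, new Dictionary<int, List<string>> { [4] = new List<string> { "phospho" } }),
    ("PROTEIN1", 10, 16, 700.0, new Dictionary<int, List<string>>()), // same peptide, unmodified
    ("PROTEIN1", 12, 20, 100.0, new Dictionary<int, List<string>>()), // overlapping peptide
};

var occupancy = CalculateOccupancy(peaks);
foreach (var (acc, sites) in occupancy)
    foreach (var (pos, values) in sites)
        foreach (var (label, value) in values)
            if (label != "Total")
                Console.WriteLine($"{acc} {pos} {label}: {value / values["Total"]:F3}");

// accession -> protein position -> modification name (or "Total") -> summed intensity
Dictionary<string, Dictionary<int, Dictionary<string, double>>> CalculateOccupancy(
    List<(string Accession, int Start, int End, double Intensity, Dictionary<int, List<string>> Mods)> rows)
{
    var occupancyDict = new Dictionary<string, Dictionary<int, Dictionary<string, double>>>();
    var proteinSeqRangesSeen = new Dictionary<string, List<(int Start, int End, double Intensity)>>();

    // Pass 1: sum intensity per modified site and record the range each peptide covers.
    foreach (var row in rows)
    {
        if (!proteinSeqRangesSeen.TryGetValue(row.Accession, out var ranges))
            proteinSeqRangesSeen[row.Accession] = ranges = new List<(int Start, int End, double Intensity)>();
        ranges.Add((row.Start, row.End, row.Intensity));

        foreach (var (peptidePosition, modNames) in row.Mods)
        {
            // Assumption: an N-terminal mod (position 0) is attributed to the first residue.
            int proteinPosition = row.Start + Math.Max(peptidePosition, 1) - 1;
            if (!occupancyDict.TryGetValue(row.Accession, out var byPosition))
                occupancyDict[row.Accession] = byPosition = new Dictionary<int, Dictionary<string, double>>();
            if (!byPosition.TryGetValue(proteinPosition, out var intensities))
                byPosition[proteinPosition] = intensities = new Dictionary<string, double> { ["Total"] = 0.0 };
            foreach (var modName in modNames)
            {
                intensities.TryGetValue(modName, out double current); // 0.0 if not seen yet
                intensities[modName] = current + row.Intensity;
            }
        }
    }

    // Pass 2: every recorded range that covers a modified site adds to that site's "Total".
    foreach (var (accession, byPosition) in occupancyDict)
        foreach (var (position, intensities) in byPosition)
            intensities["Total"] = proteinSeqRangesSeen[accession]
                .Where(r => r.Start <= position && position <= r.End)
                .Sum(r => r.Intensity);

    return occupancyDict;
}
```

In this sample, the phospho site lands at protein position 13. All three peptides cover position 13, so its "Total" includes the unmodified and overlapping peptides as well as the modified one.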